3 Python essentials - Plotting data


Install dependencies

First, we need to make sure that the required libraries are installed.

How are Python modules installed? Usually through a package manager such as pip or conda.

How do I find the command to install a specific package? Search for the package online and look at the installation section of its documentation.

For example, to install matplotlib, we would search for matplotlib on Google, find https://matplotlib.org/stable/index.html, and run the command pip install matplotlib. Because this is a shell command rather than Python code, in a notebook cell you have to prefix it with !.

!pip install matplotlib

Similarly for seaborn: search for the library, find https://seaborn.pydata.org, open the installation page, and run pip install seaborn.

!pip install seaborn
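Before installing, you can check whether a library is already importable in the current environment. A minimal sketch using the standard library's importlib.util.find_spec; is_installed is a hypothetical helper name. Note that the import name of a package can differ from its pip name (for example, scikit-learn is imported as sklearn), so check the import name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be found for import."""
    # find_spec returns None when the module cannot be located on the import path
    return importlib.util.find_spec(module_name) is not None


# Report the status of the libraries used in this chapter
for name in ["matplotlib", "seaborn", "pandas"]:
    status = "installed" if is_installed(name) else "missing, run !pip install " + name
    print(f"{name}: {status}")
```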

matplotlib and seaborn

matplotlib and seaborn are both popular plotting libraries in Python.

matplotlib is a low-level plotting library that lets you create a wide variety of plots, including line plots, scatter plots, bar plots, and histograms. Its many customization options can make it harder to use than higher-level plotting libraries, but they give you the flexibility to create almost any kind of plot you need.

Here’s a simple example of how you could use matplotlib to create a line plot:

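import matplotlib.pyplot as plt

# Data points to connect with a line
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)          # draw y against x as a line
plt.xlabel("X Axis")    # label the horizontal axis
plt.ylabel("Y Axis")    # label the vertical axis
plt.title("Line Plot")
plt.show()              # display the figure (Jupyter also shows the last figure automatically)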
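The pyplot calls above draw on an implicit "current" figure. Below is a minimal sketch of the same kind of plot using matplotlib's explicit object-oriented interface, with a few common customization options (colors, line styles, markers, grid, legend). The second series, y = x^2, is made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
y_squared = [v ** 2 for v in x]  # illustrative second series

# plt.subplots() returns a Figure and an Axes; we then call methods on the Axes
fig, ax = plt.subplots(figsize=(6, 4))  # width, height in inches
ax.plot(x, y, color="tab:blue", linestyle="--", marker="o", label="y = 2x")
ax.plot(x, y_squared, color="tab:orange", marker="s", label="y = x^2")
ax.set(xlabel="X Axis", ylabel="Y Axis", title="Customized Line Plot")
ax.grid(True, alpha=0.3)  # faint background grid
ax.legend()               # legend built from the label= of each line
plt.show()
```

Keeping a reference to ax makes it clear which plot each call modifies, which becomes important once a figure contains several subplots.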

seaborn is a higher-level plotting library built on top of matplotlib that makes it easier to create attractive and informative statistical plots. It provides built-in functions for commonly used statistical plots, such as violin plots, box plots, and heatmaps, which can save you a lot of time and give your plots a more polished look.

Here’s a simple example of how you could use seaborn to create a scatter plot:

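import matplotlib.pyplot as plt
import seaborn as sns

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

sns.scatterplot(x=x, y=y)  # seaborn draws on a matplotlib Axes...
plt.xlabel("X Axis")       # ...so matplotlib functions still set the labels and title
plt.ylabel("Y Axis")
plt.title("Scatter Plot")
plt.show()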
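seaborn is most convenient when the data live in a pandas DataFrame: you pass the table with data= and refer to columns by name. A minimal sketch with a small made-up table, using hue= to color the points by group. Axes-level functions such as sns.scatterplot return the matplotlib Axes they drew on, so you can set labels on it directly.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up long-form table: one row per point, plus a column naming its group
points = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10, 1, 4, 9, 16, 25],
    "series": ["linear"] * 5 + ["quadratic"] * 5,
})

# Columns are referenced by name; hue colors the points by the "series" column
ax = sns.scatterplot(data=points, x="x", y="y", hue="series")
ax.set(xlabel="X Axis", ylabel="Y Axis", title="Scatter Plot by Group")
plt.show()
```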

These are just simple plots; for inspiration for more advanced ones, check out the seaborn example gallery.
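As a taste of a more advanced statistical plot, here is a minimal sketch of a correlation heatmap with sns.heatmap. The data are randomly generated for illustration only: column b is built to depend on a, while c is independent noise.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Random toy data with a fixed seed so the result is reproducible
rng = np.random.default_rng(seed=0)
a = rng.normal(size=100)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=100),  # strongly related to a
    "c": rng.normal(size=100),                     # unrelated noise
})

corr = toy.corr()  # 3x3 matrix of pairwise Pearson correlations
# annot writes each value in its cell; vmin/vmax fix the color scale to [-1, 1]
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation heatmap (toy data)")
plt.show()
```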

Plotting the ESOL data

Let’s load the Delaney (ESOL) solubility data again, this time directly from its URL.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/schwallergroup/ai4chem_course/main/notebooks/01%20-%20Basics/data/delaney-processed.csv")

df.describe()  # summary statistics (count, mean, std, min, quartiles, max) for each numeric column
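Before plotting, it can help to check which numeric columns are most strongly related to the measured solubility. A minimal sketch; correlations_with is a hypothetical helper, and Pearson correlation only captures linear relationships.

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = df.select_dtypes(include="number")  # drops text columns such as SMILES strings
    corr = numeric.corr()[target].drop(target)    # remove the target's self-correlation of 1
    # Order by absolute value so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

In the notebook you would call correlations_with(df, 'measured log solubility in mols per litre') on the DataFrame loaded above.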

pandas lets us plot directly from a DataFrame, and seaborn accepts the DataFrame through its data= argument.
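DataFrame.plot returns the matplotlib Axes it drew on, so pandas plots can be customized further with matplotlib. A minimal sketch that colors a scatter plot by a third numeric column through the c= argument; scatter_colored is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_colored(df: pd.DataFrame, x: str, y: str, color_by: str):
    """Scatter column `y` against column `x`, coloring points by numeric column `color_by`."""
    # DataFrame.plot.scatter accepts a column name for c= and adds a colorbar
    ax = df.plot.scatter(x=x, y=y, c=color_by, colormap="viridis")
    ax.set_title(f"{y} vs {x}")
    return ax
```

For example, scatter_colored(df, 'Molecular Weight', 'measured log solubility in mols per litre', 'Number of Rings') followed by plt.show() would color each compound by its ring count.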

import matplotlib.pyplot as plt
import seaborn as sns

# Call plt.show() after each plot so it gets its own figure; otherwise plots that
# draw on the current Axes (histogram, regplot, box plot, violin plot) overlap
# Scatter plot of 'measured log solubility in mols per litre' vs 'Molecular Weight'
df.plot(x='Molecular Weight', y='measured log solubility in mols per litre', kind='scatter')
plt.show()
# Histogram of 'ESOL predicted log solubility in mols per litre'
df['ESOL predicted log solubility in mols per litre'].plot(kind='hist')
plt.show()
# Scatter plot with a fitted linear regression line
sns.regplot(x='Molecular Weight', y='measured log solubility in mols per litre', data=df)
plt.show()
# Joint plot with marginal histograms on the sides
sns.jointplot(x='Number of Rings', y='Number of H-Bond Donors', data=df)
plt.show()
# Box plot to show the distribution of 'Polar Surface Area'
sns.boxplot(x='Polar Surface Area', data=df)
plt.show()
# Violin plot to show the distribution of 'Number of Rotatable Bonds'
sns.violinplot(x='Number of Rotatable Bonds', data=df)
plt.show()
# Pair plot to visualize the pairwise relationships between several columns
sns.pairplot(df[['Number of H-Bond Donors', 'Molecular Weight', 'measured log solubility in mols per litre']])
plt.show()
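Instead of one figure per plot, several axes-level seaborn plots can share a single figure: create a grid with plt.subplots and pass each function its target through ax=. A minimal sketch; overview_figure is a hypothetical helper. Figure-level functions such as sns.jointplot and sns.pairplot create their own figure and cannot be placed on an existing Axes this way.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def overview_figure(df: pd.DataFrame):
    """Draw three axes-level seaborn plots of the ESOL columns side by side."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))  # one row, three columns
    # Each axes-level function draws on the Axes passed via ax=
    sns.regplot(data=df, x="Molecular Weight",
                y="measured log solubility in mols per litre", ax=axes[0])
    sns.boxplot(data=df, x="Polar Surface Area", ax=axes[1])
    sns.violinplot(data=df, x="Number of Rotatable Bonds", ax=axes[2])
    fig.tight_layout()  # reduce overlap between neighboring labels
    return fig, axes
```

In the notebook, overview_figure(df) followed by plt.show() would display the three plots in one row.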